Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Add a validation API to DeleteFiles which validates files exist prior to attempting to deletion. #8525

Merged

Conversation

amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Sep 8, 2023

Currently there is no validation on delete files API for verifying that the file to be deleted actually exists prior to the commit. This can cause unexpected behavior, for example:

1.) Rewrite Data files compacts FILE_A + some delete files
2.) Concurrently a DeleteFiles call was done for FILE_A.
3.) Deletion gets retried after 1 completes. At the point it retries, FILE_A no longer exists due to the compaction so the deletion is a no-op.

From a user's perspective when they go and query the table after 3, they'll still see the contents of FILE_A, even though the wouldn't expect it since they received a successful delete of FILE_A.

This change currently adds a new validation API as opposed to changing the behavior to always do strict validation for purpose of backwards compatibility. I wanted to open this up and discuss with the community to see about if we want the new API or if we just want to change this behavior since it's mainly just used by maintenance procedures and I'm not sure where the current behavior would be desired.

Also I'd need to compare what serializable isolation vs snapshot isolation should guarantee in this concurrent delete scenario.

@rdblue rdblue added this to the Iceberg 1.4.0 milestone Sep 19, 2023
@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-delete-validation branch 2 times, most recently from ac725f3 to 77fbb4a Compare September 20, 2023 22:10
@amogh-jahagirdar amogh-jahagirdar marked this pull request as ready for review September 20, 2023 22:11
Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I think it's good to merge when @RussellSpitzer's concern has been addressed. I think that the test has been rewritten, but you guys can confirm.

prior to attempting to deletion.

Simplify/improve the validation check

Use failMissingDeletePaths, more simplification
@amogh-jahagirdar
Copy link
Contributor Author

Thanks all for the patient reviews, merging!

@amogh-jahagirdar amogh-jahagirdar merged commit f826cf4 into apache:master Sep 21, 2023
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants